ABSTRACT

Due to its attractive characteristics in terms of performance,
weight and power consumption, NAND flash memory has become
the main non-volatile memory (NVM) in embedded systems.
These NVMs also present specific characteristics and
constraints: good but asymmetric I/O performance, limited
lifetime, write/erase granularity asymmetry, etc.

Those peculiarities are either managed in hardware for 
flash disks (SSDs, SD cards, USB sticks, etc.) or in soft- 
ware for raw embedded flash chips. When managed in soft- 
ware, flash algorithms and structures are implemented in a 
specific flash file system (FFS). In this paper, we present 
a performance study of the most widely used FFSs in embedded
Linux: JFFS2, UBIFS, and YAFFS. We show some
very particular behaviors and large performance disparities 
for tested FFS operations such as mounting, copying, and 
searching file trees, compression, etc. 

Categories and Subject Descriptors 

D.4.3 [Operating Systems]: File System Management; 
D.4.2 [Operating Systems]: Storage Management - Secondary
Storage; E.5 [Files]: Organization/Structure; D.4.8
[Operating Systems]: Performance - Measurements

Keywords 

NAND flash memory, Embedded storage, Flash File Sys- 
tems, I/O Performance, Benchmarking 

1. INTRODUCTION 

NAND and NOR flash are the most common types of 
flash memories [12]. NOR Flash memory provides good read 
performance and random byte access at the cost of slow write 
operations and low data densities. It is suitable for code 
storage and execution and is used as a replacement of DRAM
in some mobile appliances. NAND flash memory is dedicated
to data storage. It provides more balanced read and write
performance (even though asymmetric) and a higher data
density, at a lower cost. It is used as the main secondary 
storage for many embedded systems. In this paper we are 
only concerned with NAND flash memories (designated as 
flash memory in the rest of the paper). 

Data in flash memory is organized hierarchically: a chip
is divided into planes, themselves divided into blocks. Blocks
are composed of pages. Finally, pages are divided into a
user-data space and a small meta-data section called the
out-of-band area. Today, flash blocks typically contain
64 pages, and the page size is generally 2048 bytes [12].
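
As a quick check of these figures, the short Python sketch below derives the block size from the page count and page size given above. The 256 MiB chip capacity is only an illustrative assumption, and the out-of-band area is ignored.

```python
# Geometry figures quoted in the text.
PAGE_SIZE = 2048        # bytes of user data per page
PAGES_PER_BLOCK = 64

# Illustrative assumption: a 256 MiB chip; out-of-band bytes are ignored.
CHIP_SIZE = 256 * 1024 * 1024

block_size = PAGE_SIZE * PAGES_PER_BLOCK
blocks_per_chip = CHIP_SIZE // block_size

print(block_size // 1024, "KiB per block")    # 128 KiB per block
print(blocks_per_chip, "blocks in the chip")  # 2048 blocks in the chip
```

The erase unit (one block) is thus 64 times larger than the read/write unit (one page), which is the granularity asymmetry mentioned in the abstract.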

Flash memory supports three operations: read and write
operations, performed at page level, and the erase operation,
performed on an entire block. While flash memory provides many
benefits, it also comes with specific drawbacks due to its
internal intricacies. The first is the erase-before-write
limitation, which imposes a costly block erase operation before
data can be written. The consequence of this constraint is the
inability to achieve efficient in-place data modification. The
other very important drawback is the limited lifetime. A flash
memory cell can sustain a limited number of erase operations,
above which it can no longer retain data. Typically, a NAND
flash cell has an endurance estimated between 10^4 and 10^5
write/erase cycles [5]. Moreover, NAND flash chips can leave
the factory with faulty cells. Flash management algorithms
partly resolve the problem by providing spare cells and
implementing prevention mechanisms.

Flash memory constraints require specific management.
The erase-before-write rule is bypassed by writing new
versions of data to other locations and invalidating old ones.
Such a mechanism requires a garbage collector that periodically
recycles invalidated data into free space. Moreover, it is
necessary to evenly distribute the write/erase cycles (wear)
over the whole flash memory in order to maximize its lifetime.
This is called wear leveling.
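
The out-of-place update mechanism can be illustrated with a toy model. The Python sketch below is a hypothetical simplification, not the algorithm of any particular FFS or FTL: the class name `ToyFlash`, the tiny block size, and the garbage collection policy (erase only blocks that hold no valid page) are assumptions made for readability, and wear leveling is left out.

```python
PAGES_PER_BLOCK = 4  # tiny value for readability; real blocks hold e.g. 64 pages


class ToyFlash:
    """Out-of-place updates with page invalidation and a very simple GC."""

    def __init__(self, num_blocks):
        # Each page is None (free), ("valid", key) or ("invalid", key).
        self.blocks = [[None] * PAGES_PER_BLOCK for _ in range(num_blocks)]
        self.erase_counts = [0] * num_blocks
        self.location = {}  # key -> (block, page) of the current version

    def _find_free_page(self):
        for b, block in enumerate(self.blocks):
            for p, page in enumerate(block):
                if page is None:
                    return b, p
        return None

    def garbage_collect(self):
        """Erase every block that holds invalid pages and no valid page."""
        for b, block in enumerate(self.blocks):
            states = [page[0] for page in block if page is not None]
            if "invalid" in states and "valid" not in states:
                self.blocks[b] = [None] * PAGES_PER_BLOCK
                self.erase_counts[b] += 1

    def write(self, key):
        """Write a new version of key to a free page, never in place."""
        slot = self._find_free_page()
        if slot is None:
            self.garbage_collect()
            slot = self._find_free_page()
            if slot is None:
                raise RuntimeError("no reclaimable space left")
        if key in self.location:
            old_b, old_p = self.location[key]
            self.blocks[old_b][old_p] = ("invalid", key)  # invalidate old version
        b, p = slot
        self.blocks[b][p] = ("valid", key)
        self.location[key] = (b, p)


flash = ToyFlash(num_blocks=2)
for _ in range(9):          # nine successive updates of the same data
    flash.write("a")
print(flash.erase_counts)   # [1, 0]
print(flash.location["a"])  # (0, 0)
```

The ninth update finds no free page, so the first block, which by then contains only invalidated versions, is erased and reused. A real garbage collector also has to copy still valid pages out of partially invalid blocks, which this sketch does not attempt.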

Specific flash memory algorithms and structures can be
managed through a hardware layer called the Flash Translation
Layer (FTL) [5][7]. The FTL implements the wear leveling and
garbage collection algorithms and is used in Solid State
Drives (SSDs), compact flash cards, USB sticks, etc.
Performance of FTLs is a very challenging topic and several
studies have been done in the domain. Flash memory algorithms
and structures are generally implemented in software when the
flash is integrated into an embedded system. In embedded Linux
operating systems, this is performed through a dedicated
FFS [3] [12] [2] [9].

The explosion of the embedded Linux market led us to consider
FFS performance issues more seriously. In our opinion, too few
studies have been done on the performance of these very widely
used FFSs. This paper is an attempt to partially fill this gap
by presenting a testing methodology and results on the most
used FFSs. This work focuses on specific file system operations
such as mounting, copying file trees, file searches, and file
system compression. We do not consider flash specific
operations such as sequential and random read/write operations,
nor specific garbage collection and wear leveling tests, which
will be considered in a future study.

In the next section we present some related work on
FFS performance evaluation. Then, we briefly explain
FFS features and provide some examples. Next we describe
the benchmarking methodology, and then we discuss
the results before concluding.

2. RELATED WORK 

In [8], the authors compared JFFS2, YAFFS2 and UBIFS.
Analyzed metrics are file system mount times, system per- 
formances (using the Postmark benchmark), memory con- 
sumption, wear leveling and garbage collection cost. They 
conclude that the choice of an FFS can be motivated by 
hardware constraints, particularly available flash size. In 
[10] the authors compared the performance of JFFS2, YAFFS, 
UBIFS and SquashFS, a read-only file system which is not 
dedicated to flash memory. It is stated that UBIFS is the 
way to go in case of large flash chips. For smaller sizes,
JFFS2 is favored over YAFFS2.

A performance evaluation of UBI and UBIFS is provided 
in [11]. The authors identified very specific weaknesses of
the file system about mount time and flash space overhead. 
In E], benchmarks are performed on JFFS2, YAFFS2 and 
UBIFS, comparing mount time, memory consumption and 
I/O performance on various kernel versions from 2.6.36 to 
3.1. 

Our study differs from the above related work in that it
gives more details on FFS mounting and compression
performance, in addition to benchmarking new operations (file
tree copying, file search, and compression). In our view, FFS
performance evaluation can be split into two parts: 1) the
performance of the FFS itself and how it accesses its meta-data
(which can be stored in the flash itself), and 2) the
performance of accessing the flash memory through the chosen
FFS by applying different I/O workloads. In this paper, we
focus only on the first part.

3. LINUX FLASH FILE SYSTEMS 

In this section we briefly present three recent NAND FFSs. 
Before going into further details about Linux FFSs, we in- 
troduce the encompassing software architecture. 

3.1 A layered organization in the kernel 

The location of FFSs in the kernel is depicted in Figure 1.
We can identify several software layers: VFS, the various
FFS layers, and the Memory Technology Device (MTD) layer.

The high-level Virtual File System layer (A in Figure 1) is
located above the FFS one. VFS allows different file systems
to coexist, and presents the same interface to the user level.
This implies that each file system written for Linux must be
compliant with the VFS interface. VFS uses memory structures
created on demand: when needed, VFS asks the file system (B in
Figure 1) to build the corresponding object. VFS also maintains
cache systems to accelerate I/O operations. At a lower level,
the FFS must be able to perform raw flash I/O operations. The
NAND driver is provided by the MTD layer (C in Figure 1), a
generic subsystem whose role is to provide a uniform interface
to various memory devices, including NAND and NOR flash memory.

Figure 1: FFS in the Linux kernel layers.

3.2 FFSs algorithms and structures 

Linux FFSs studied in this paper are depicted in Figure 1.

These FFSs share some common features. 1) Data compression
is supported by most of them. Compression not only reduces the
size of the data on the media, but also reduces the I/O load
when performed on the fly (i.e. at runtime), at the expense of
an increased CPU load (for compression/decompression). 2) Bad
block management is provided by all FFSs. Bad blocks are
generally identified using a specific marker in the out-of-band
area, and are never used. 3) All the FFSs provide wear leveling
and garbage collection mechanisms. 4) Finally, some FFSs also
provide journaling capabilities, which consist in writing the
description of each file system modification to a journal
before performing the modification itself. The purpose of
this service is to keep a valid version of the data to use in
case of a system crash. The modification is therefore performed
out-of-place, and validated in the journal once completely
performed. UBIFS, for instance, is a journaled FFS.

3.2.1 JFFS2 

The Journaling Flash File System version 2 (JFFS2) [12]
is today's most commonly used FFS. It has been present in
the kernel mainline since Linux 2.4.10 (2001). The storage
unit for data/meta-data is the JFFS2 node. A node represents
a file, or a part of a file. Its size varies between a
minimum of one flash page and a maximum of half a flash erase
block. At mount time, JFFS2 has to scan the entire partition
in order to create a direct mapping table to the JFFS2 nodes
on flash.

JFFS2 works with three lists of flash memory blocks: 1)
the free list is a list of blocks that are ready to be written.
Each time JFFS2 needs a new block to write a new version of a
node, the block is taken from this list. 2) The clean list
contains blocks holding only valid nodes, and 3) the dirty list
contains blocks with at least one invalid node. When the free
space becomes low, the garbage collector erases blocks from the
dirty list. In order to perform wear leveling, JFFS2
occasionally (with a given probability) chooses to pick one
block from the clean list instead (copying its data elsewhere
beforehand).
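
The probabilistic choice between the dirty and clean lists can be sketched as follows. This is a minimal Python illustration of the idea described above, not the JFFS2 kernel code: the function name `pick_gc_block`, the list representation, and the default probability of 0.01 are all assumptions chosen for the example.

```python
import random


def pick_gc_block(dirty_list, clean_list, rng, clean_probability=0.01):
    """Pick the next block to recycle, or None if nothing can be recycled.

    Most of the time a dirty block is chosen, since erasing it frees space.
    With a small probability (an arbitrary value here) a clean block is
    chosen instead, so that blocks holding static data also get erased
    from time to time and wear is spread over the whole device.
    """
    if clean_list and rng.random() < clean_probability:
        return "clean", clean_list.pop(0)
    if dirty_list:
        return "dirty", dirty_list.pop(0)
    return None


rng = random.Random(42)
picks = {"clean": 0, "dirty": 0}
for _ in range(10_000):
    # Hypothetical block numbers; the lists are rebuilt for each draw.
    kind, _block = pick_gc_block([1, 2, 3], [10, 11, 12], rng)
    picks[kind] += 1
print(picks)  # a large majority of "dirty" picks, a few "clean" ones
```

When a clean block is picked, its valid nodes must be copied elsewhere before the erase, which is the extra cost paid for wear leveling.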

Although JFFS2 is widely used, it scales linearly with the
flash size from a performance point of view (more details in
the following sections). This means that JFFS2 RAM usage and
mount time increase linearly with the size of the managed
flash device. JFFS2 supports multiple compression algorithms,
including Zlib and Lzo. Nodes are (de)compressed at runtime
when they are accessed.

3.2.2 YAFFS2 

The original specifications of Yet Another Flash File System
version 2 (YAFFS2) [9] date back to 2001. YAFFS2 is integrated
into the kernel with a patch.

YAFFS2 stores data in a structure called the chunk. Each
file is represented by one header chunk containing meta-data
(type, access rights, etc.) and data chunks containing
file/user data. The size of a chunk is equal to the size of the
underlying flash page. Chunks are written to flash memory
sequentially. In addition to the meta-data stored in the header
chunk, YAFFS2 also uses the out-of-band area of data chunks,
for example to invalidate updated data. Like JFFS2, YAFFS2
scans the whole partition at mount time. YAFFS2 does not
support compression.

Garbage collection (GC) can be performed either 1) when
a write occurs and the block containing the old version is
identified as completely invalid, in which case it is erased
by GC, or 2) when the free space goes under a given threshold,
in which case GC selects some blocks containing valid data,
copies the still valid chunks to other locations and erases
the blocks.

Like JFFS2, YAFFS2 performance and meta-data size
scale linearly with the flash memory size.

3.2.3 UBI + UBIFS 

UBIFS [2] has been integrated into the kernel mainline since
Linux 2.6.27 (2008). UBIFS aims to solve many problems
related to the scalability of JFFS2. UBIFS relies on an
additional layer, the UBI layer. UBI [6] performs a logical
to physical flash block mapping, and thus relieves UBIFS
of the wear leveling and bad block management functions.

While JFFS2 and YAFFS2 use tables, UBIFS uses tree-based
structures for file indexing. The index tree is stored
on flash through index nodes. The tree leaves point to flash
locations containing data or meta-data. UBIFS also
uses standard data nodes to store file data. UBIFS partitions
the flash volume into several parts, such as the main area,
containing data and index nodes, and the Logical erase block
Property Tree area, containing meta-data about blocks: erase
and invalid counters used by the GC. As flash does not allow
in-place data updates, when a tree node is updated, its parent
and all its ancestor nodes are moved to another location.
That is why it is called a wandering tree.
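
The wandering behavior can be illustrated with a small path-copying sketch in Python. It is only a model of the principle described above, under assumed names (`Node`, `update`, `next_address`) and with integer "addresses" standing in for flash locations; it does not reproduce the UBIFS on-flash index format.

```python
import itertools


class Node:
    """Tree node; 'address' stands for its (hypothetical) flash location."""

    def __init__(self, address, children=None, data=None):
        self.address = address
        self.children = children or {}  # name -> Node, for index nodes
        self.data = data                # payload, for leaves


def update(node, path, data, next_address):
    """Return a new tree in which the leaf reached by 'path' holds 'data'.

    Every node on the path is rewritten at a fresh address, because it must
    point to the new location of its child. Nodes off the path are shared
    with the old tree and do not move.
    """
    if not path:
        return Node(next_address(), data=data)
    children = dict(node.children)
    children[path[0]] = update(node.children[path[0]], path[1:], data, next_address)
    return Node(next_address(), children=children)


counter = itertools.count(100)


def next_address():
    return next(counter)


leaf_a = Node(1, data="old A")
leaf_b = Node(2, data="B")
inner = Node(3, children={"a": leaf_a, "b": leaf_b})
leaf_c = Node(4, data="C")
root = Node(5, children={"x": inner, "y": leaf_c})

new_root = update(root, ["x", "a"], "new A", next_address)
print(new_root.address)                # 102: the root has moved
print(new_root.children["x"].address)  # 101: so has the parent of the leaf
print(new_root.children["y"].address)  # 4: untouched subtree stays in place
```

Updating a single leaf rewrites one node per tree level, which is consistent with the logarithmic scaling of UBIFS discussed in this section.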

In order to reduce the number of flash accesses, file data
and meta-data modifications are buffered in main memory
and periodically flushed to flash. Each modification of
the file system is logged by UBIFS in order to maintain
file system consistency in case of a power failure. UBIFS
supports the Lzo and Zlib compression algorithms, and
provides a favor Lzo compression option, which alternates
between Lzo and Zlib for a more balanced CPU usage ratio.

With the use of tree structures, UBIFS layer performance 
scales in a logarithmic way with the size of the underlying 
flash partition, while the UBI layer scales linearly [1].
Figure 2: File tree generation parameters.

4. BENCHMARKING METHODOLOGY 

We performed our benchmarking on an Armadeus APF27
embedded board, equipped with an i.MX27 CPU clocked at
400 MHz, 2*64 MB of RAM, and a 256 MB SLC (Single
Level Cell) NAND flash memory chip used for secondary
storage. The flash chip is a Micron SLC NAND flash with
a reported read latency of 25 µs, a write latency of 300 µs,
and an erase latency of 2 ms. The Linux kernel version used
is 2.6.38.8.
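
These latencies give a rough order of magnitude for the cost of reading every page of a partition once, which is relevant for file systems that scan the whole partition at mount time. The Python sketch below is a back-of-the-envelope estimate only: it takes the 2048-byte page size from the introduction and a 100 MB partition as in the scenarios of this paper, and it ignores bus transfer time, out-of-band reads and software overhead.

```python
PAGE_SIZE = 2048                    # bytes, as given in the introduction
READ_LATENCY_US = 25                # reported page read latency
PARTITION_SIZE = 100 * 1024 * 1024  # 100 MB partition, as in the scenarios

pages = PARTITION_SIZE // PAGE_SIZE
scan_time_s = pages * READ_LATENCY_US / 1_000_000
print(pages, "pages,", scan_time_s, "s of raw read latency")
# 51200 pages, 1.28 s of raw read latency
```

Measured mount times depend on how much of each block a given FFS actually reads, so this figure should not be read as a prediction of the results.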

4.1 Performance metrics 

When performing complete FFS benchmarking, one has
to consider traditional file system metrics: read and write
I/O performance, RAM usage and its evolution according
to the partition size, CPU usage, mount time, tolerance to
power failures, compression, etc. One also has to consider
FFS specific metrics, related to wear leveling, garbage
collection, and bad block management. For space reasons, we
only focus on high level file manipulations in this paper. We
do not explicitly take into account FFS specific metrics. The
considered metrics are: (un)mount time, operations (read,
search and copy) on file trees, and the impact of compression
on file system performance.

Execution time measurement and file tree generation. 

We used the gettimeofday() system call to measure
execution times. It provides microsecond precision.
The executed commands were launched with the help of the
system() function. In our benchmarks we used various file
trees to efficiently measure FFS performance. To do so,
we developed a file tree generation tool that can be used on
any file system. We rely on several parameters to define
the file tree, as depicted in Figure 2: the number of files
per generated directory (A), the number of directories per
generated directory (B), the size of generated files (C), and
the file tree depth (D). The tool is very flexible as it allows
these parameters to be defined with probability distributions.
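
A generator driven by these four parameters could look like the Python sketch below. It is not the tool used in this study: the function names, the file naming scheme, the use of random file contents, and the convention that a depth of 1 means a single directory level are all assumptions made for illustration.

```python
import os
import random
import tempfile


def constant(value):
    """Parameter that always takes the same value."""
    return lambda rng: value


def normal(mean, stddev):
    """Parameter drawn from a normal distribution, rounded to an integer."""
    return lambda rng: round(rng.gauss(mean, stddev))


def generate_tree(root, depth, files_per_dir, dirs_per_dir, file_size, rng):
    """Create a file tree under 'root' and return the number of files written.

    files_per_dir (A), dirs_per_dir (B) and file_size (C) are callables
    taking the random generator, so constants and distributions are
    handled the same way. 'depth' (D) counts directory levels.
    """
    os.makedirs(root, exist_ok=True)
    created = 0
    for i in range(max(0, files_per_dir(rng))):
        size = max(0, file_size(rng))
        with open(os.path.join(root, f"file{i}"), "wb") as f:
            f.write(os.urandom(size))
        created += 1
    if depth > 1:
        for j in range(max(0, dirs_per_dir(rng))):
            created += generate_tree(os.path.join(root, f"dir{j}"), depth - 1,
                                     files_per_dir, dirs_per_dir, file_size, rng)
    return created


with tempfile.TemporaryDirectory() as tmp:
    count = generate_tree(os.path.join(tmp, "tree"), depth=3,
                          files_per_dir=constant(2), dirs_per_dir=constant(2),
                          file_size=normal(1024, 64), rng=random.Random(1))
    print(count)  # 14: 7 directories (1 + 2 + 4) holding 2 files each
```

Passing callables rather than plain numbers keeps the generator itself independent of the chosen distribution.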

Benchmarking scenarios. 

We wrote some shell scripts performing various file sys- 
tem related operations. We measured the execution time of 
each operation with the method presented above. For the 
operations involving file tree manipulation, we define these 
file trees with the parameters presented earlier. Here we 
present two scenarios through which many operations were 
measured. 
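
The measurement principle (read a clock, launch the command, read the clock again) can be sketched as follows. The study itself relied on gettimeofday() and system() in C; the Python version below is only an analogous illustration, and the helper name `time_command` is hypothetical.

```python
import subprocess
import sys
import time


def time_command(argv):
    """Run a command and return (elapsed_seconds, exit_status)."""
    start = time.perf_counter()
    status = subprocess.run(argv, stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL).returncode
    return time.perf_counter() - start, status


# Harmless demonstration command: start the Python interpreter and exit.
elapsed, status = time_command([sys.executable, "-c", "pass"])
print(f"exit status {status}, {elapsed:.6f} s")
```

On a target board, the same helper could be called with a command such as `["ls", "-R", "/mnt/flash"]`, where `/mnt/flash` is a hypothetical mount point. The measured time includes process creation, as it also does when a command is launched through system().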

In scenario 1 (S1), we first prepared several FFS
images, varying compression options when available. The
images are based on the same directory tree, which is a
standard embedded Linux root file system. This tree contains
213 directories and 1122 files. Each directory of this rootfs
contains an average of 4.95 files and 1 directory. The average
file size is 13 KB. Each image is mounted in a 100 MB
partition, and a recursive ls -R command is performed on the
mount point to evaluate the FFS meta-data read operations.
This forces VFS to ask the FFS for meta-data about file tree
organization and file/directory names. A second ls -R is
performed to isolate the cache effect of VFS. In the mounted
partition we then create another file tree. The parameters
for this file tree are a depth of 5, a number of files per
generated directory drawn from a normal distribution with a
mean of 4 and a standard deviation of 1 (norm(4, 1)), a number
of directories per generated directory of norm(3, 1), and a
file size of norm(1024, 64) bytes. The partition is then
unmounted.

Figure 3: Compression efficiency

In scenario 2 (S2), we erase a 100 MB flash partition
and create an empty file system. Next we create a file of
random data that fills the whole partition. This file is then
deleted. This forces the FFS to invalidate the corresponding
data, giving more representative conditions of the flash state
for the rest of the scenario (this can be considered as a
partition warm-up). Then we create a file tree with the
following parameters: a depth of 5, a variable number of files
per generated directory, a file size of 750 bytes, and 2
directories per generated directory. We unmount the partition,
then remount it. Next we perform a find command on the mount
point, searching for a file that is not present in the file
tree, in order to obtain the worst case (a search through the
whole meta-data). The whole file tree is then deleted, and the
file system unmounted. We ran this scenario for each of the
tested FFSs, with default compression options.

The first scenario allows us to measure the compression
efficiency of the file system, the mount time, meta-data reads
(through the ls command), file tree creation on a given
FFS, and the unmount time. The second scenario performs
a warm-up of the flash partition (by making it dirty) before
measuring the mount and unmount operations, the search in FFS
meta-data (find command), and file tree deletion.

5. RESULTS AND DISCUSSION 
5.1 Compression 

Compression efficiency. 

The size of the S1 images is presented in Figure 3. We can
observe that compression reduces the size of the stored data
by up to 40% in some cases for JFFS2 and UBIFS. The
uncompressed image sizes differ between FFSs because the
granularity of the file write operation is different in each
FFS. For example, JFFS2 manages nodes whose size varies
between 1 page and half of a flash block, while YAFFS2
always uses one page per chunk in addition to a dedicated
page for the file header chunk (which is very space consuming,
mainly when dealing with small files). This explains the
larger size of the YAFFS2 image.

Figure 4: Scenario 1 operations execution times
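
The space cost of the one-page-per-chunk layout can be estimated with simple arithmetic. The Python sketch below follows only the description given in this paper (one header chunk per file, data chunks of one 2048-byte page each); it ignores the out-of-band area and any other on-flash structure, and the helper name `yaffs2_pages` is hypothetical.

```python
import math

PAGE_SIZE = 2048  # bytes; a chunk has the size of one flash page


def yaffs2_pages(file_size):
    """Pages used by one file: one header chunk plus its data chunks."""
    return 1 + math.ceil(file_size / PAGE_SIZE)


for size in (750, 1024, 13 * 1024):
    used = yaffs2_pages(size) * PAGE_SIZE
    print(f"{size} bytes of data occupy {used} bytes of flash "
          f"({used / size:.1f}x)")
```

Under these assumptions, a 1024-byte file occupies two full pages, i.e. four times its size, whereas the overhead becomes much smaller for files spanning many pages.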

Compression impact on S1.

Figure 4 depicts the measured execution times of S1
operations. Before flashing a new image we perform a full
flash erase, except between None(1) and None(2) for both
YAFFS2 and JFFS2, which represent two consecutive runs of
S1 on the same uncompressed image. Compression reduces the
stored data size and also flash I/Os, thus enhancing the
overall performance (mount time, file tree creation, and ls
command). This is particularly visible for JFFS2's mount time
and file tree creation time. We can also notice that UBIFS
compression does not affect performance.
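
The link between compression and flash I/O can be illustrated by counting the pages needed to store the same data with and without compression. The Python sketch below uses the zlib module of the Python standard library as a stand-in for the kernel compressors; the 4 KiB chunking, the fallback to raw data when compression does not help, and the sample data are assumptions, not a description of how JFFS2 or UBIFS lay out nodes.

```python
import math
import zlib

PAGE_SIZE = 2048   # bytes per flash page
CHUNK_SIZE = 4096  # assumption: data is compressed in independent 4 KiB chunks


def pages_needed(data, compress):
    """Count the flash pages needed to store 'data', chunk by chunk."""
    total = 0
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        if compress:
            packed = zlib.compress(chunk)
            # Keep the raw chunk when compression does not help.
            chunk = packed if len(packed) < len(chunk) else chunk
        total += len(chunk)
    return math.ceil(total / PAGE_SIZE)


# Highly repetitive sample data, similar in spirit to a configuration file.
text = b"export PATH=/bin:/usr/bin\n" * 2000
print(pages_needed(text, compress=False), "pages without compression")
print(pages_needed(text, compress=True), "pages with zlib")
```

Fewer pages to write or read means fewer flash operations, at the price of the CPU time spent in the compressor. Real file system content compresses far less than this artificial sample.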

5.1.1 File system operations 

Mount / Unmount times. 

From Figure 4 one can notice the very long execution time of
the unmount operation for the uncompressed JFFS2 image
in the first run. Each JFFS2 mounted partition has its own
garbage collection thread (GC thread), in charge of recycling
invalid blocks in the background. It is also responsible for
formatting a recently created partition. This means that
a newly created and mounted partition must wait for the
format operation to finish before it can be unmounted.
In the None(1) case for JFFS2, the umount call has to wait
until the GC thread terminates before beginning to unmount
the partition. If we repeat S1 a second time on the same image
(the None(2) case for JFFS2), we see a very fast unmount of the
file system because it has already been formatted. For the
next runs of S1 on the following JFFS2 compressed images, we
give the GC thread the time to finish before calling umount.
The time the GC thread takes to perform the format operation
strongly depends on the partition size. The GC thread is also
responsible for the long file tree creation time of JFFS2
None(1), because it runs in the background and disrupts
standard file system I/O.
Figure 5: File system operation execution time results: find, mount, and file tree creation time in S2.



We also did two runs with the YAFFS2 image. One can
notice the long mount time of the first run compared to the
second one. In fact, YAFFS2 has to scan the whole partition
at mount time to recreate the needed meta-data structures
in RAM. At unmount time, this set of meta-data can be written
to flash and read during the next mount, thus speeding up
the mount.

From S2 we found that JFFS2 and UBIFS mount times
do not seem to strongly depend on the number of files in the
mounted partition. JFFS2 mount time is very long compared to
the two other FFSs: there is approximately one order of
magnitude of difference. YAFFS2's mount time seems to
increase with the number of files, reaching the same
value as UBIFS at about 2 000 files (~2 MB). Conversely, the
mount time seems to highly depend on the partition size for
both JFFS2 and YAFFS2. This is due to the fact that both
file systems have to scan the entire partition at mount time.
Because of lack of space, the related figure is not presented
in this paper.

File tree creation, deletion, and search execution time. 

Figure 5 shows find command, file tree creation and deletion
execution times according to the number of files in the
file tree. One can observe that the execution times seem to
grow linearly with the number of files. For the find
command execution time, JFFS2 and UBIFS show about
the same results, while YAFFS2 is up to two times faster
(2.5 vs 1 second for a find command on 2 000 files). For
file tree creation, UBIFS outperforms the two other FFSs
when the number of created files is greater than 250. For
smaller sets of files, YAFFS2 gives the best results.
Concerning file deletion, the best results are achieved by
JFFS2, while YAFFS2 gives execution times 5 times longer
(0.8 vs 4 seconds). YAFFS2's poor results are due to the fact
that the file system has to write a file header for each
file deletion.

6. CONCLUSION AND FUTURE WORKS 

In this paper, we presented global FFS mechanisms and
provided examples of current implementations: JFFS2,
YAFFS2, and UBIFS. Our performance evaluation can be
summarized with the following table:
Providing compression gives a clear advantage to JFFS2
and UBIFS over YAFFS2, because it considerably reduces the
size of stored data. Regarding high level file manipulation
operations, our tests show that UBIFS gives the best results
when creating file trees. One of YAFFS2's strengths seems to
be file meta-data search, as it outperforms the other FFSs
by a factor of two. Regarding mount time, UBIFS gives the
best results, thanks to the small size of the scanned
journal and the usage of tree-based data structures. YAFFS2's
checkpointing technique also gives good results in terms of
mount time for the benchmarked partitions.

In the future, we plan to expand our performance evaluation
with additional metrics, more specifically RAM usage
and CPU load, and their evolution according to various
parameters such as flash partition size and file tree
parameters. We also plan to measure and estimate the power
consumption profiles of the presented file systems.